How does a protein search for the specific site on DNA: the role of disorder
Proteins can locate their specific targets on DNA up to two orders of magnitude faster than
the Smoluchowski three-dimensional diffusion rate. This happens due to non-specific adsorption of
proteins to DNA and subsequent one-dimensional sliding along DNA. We call such a one-dimensional
route towards the target the "antenna". We study the role of the dispersion of nonspecific binding
energies within the antenna due to the quasi-random sequence of natural DNA. A random energy profile
for sliding proteins slows the search for the target. We show that this slowdown is different
for macroscopic and mesoscopic antennas.



A protein binding to a specific site on DNA, which we call the target, is one of the central paradigms of biology. Well-known examples include the Lac repressor in E. coli, which regulates a specific gene producing the enzyme that consumes lactose, and the proper restriction enzyme destroying the genome of the λ-phage invading E. coli in the real-time warfare for bacterial survival. It has been known since the early days of molecular biology that in some cases proteins can find their target sites along a DNA chain one to two orders of magnitude faster than the maximum rate achievable by three-dimensional diffusion. To resolve this paradox, non-specific binding and subsequent one-dimensional sliding of proteins along the DNA to the target was suggested as an important component of the search process. This idea was studied in various models proposed by both physicists and biologists. A comprehensive study of the interplay between 1D sliding and 3D diffusion for different DNA conformations, and of its effect on the search rate, can be found in Ref. [9].

FIG. 1: Distribution of nonspecific adsorption energies $\epsilon$ and of the chemical potential $\mu(x)$ along the DNA molecule. The target site is located at $x = 0$; $\lambda$ is the antenna length.

Some authors calculate the typical time $\tau$ needed for the target site to be found by a protein when a small concentration $c$ of proteins is randomly introduced into the system. Other authors consider the specific site as a sink consuming proteins with the diffusion-limited rate $J$ proportional to the concentration $c$ (which in turn should be maintained at a constant level by an influx of proteins into the system). Obviously then, $\tau = 1/J$. The search rate enhancement due to sliding along DNA may be calculated as the ratio of the rate $J$ to the 3D Smoluchowski rate $J_s = 4\pi D_3 c b$ of diffusion to a sphere of radius $b$ modeling the target site on DNA. The central physical idea is that one can define a piece of DNA adjacent to the target for which 1D sliding diffusion dominates over the parallel 3D diffusion channel and which, therefore, serves as a receiving antenna for the 3D Smoluchowski-like diffusion of proteins. Then the key point of the theory is to find the antenna length $\lambda$. In the language of the stationary flux $J$, this is done by matching the incoming 3D flux $J_3$ of proteins to the antenna with the 1D flux $J_1$ of proteins sliding on the antenna toward the target.
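As a rough numerical illustration of the relation $\tau = 1/J$ and of the Smoluchowski rate $J_s = 4\pi D_3 c b$, the following Python sketch evaluates both. The parameter values are hypothetical order-of-magnitude choices made only for this example; they are not taken from the text.

```python
import math

def smoluchowski_rate(D3, c, b):
    """3D diffusion-limited rate J_s = 4*pi*D3*c*b to a sphere of radius b."""
    return 4.0 * math.pi * D3 * c * b

def search_time(J):
    """Typical search time tau = 1/J for a stationary rate J."""
    return 1.0 / J

# Hypothetical illustrative values (assumptions, not from the paper):
D3 = 1.0e-10   # m^2/s, 3D diffusion coefficient of the protein
c = 1.0e18     # 1/m^3, protein concentration (about 1.7 nM)
b = 1.0e-9     # m, size of the target site

J_s = smoluchowski_rate(D3, c, b)      # rate without any sliding
tau_s = search_time(J_s)               # corresponding search time

# An enhancement J/J_s = 100 shortens the search time by the same factor.
tau_fast = search_time(100.0 * J_s)
```

In this sketch the enhancement factor is simply imposed by hand; the rest of the paper is about how it follows from the antenna length.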

All the works cited above assume that the nonspecific adsorption energy $w$ of the protein is sequence independent, i.e., the energy profile experienced by the searching protein away from the target is totally flat. This, however, disagrees with the quasi-random character of natural DNA sequences. It is known that the nonspecific protein-DNA adsorption energy can be divided into two parts: (i) the sequence-independent Coulomb energy of attraction between the positively charged domain of the protein surface and the negatively charged phosphate backbone, and (ii) the sequence-specific adsorption energy due to the formation of hydrogen bonds between the protein and the DNA bases. These bonds are formed by the recognition $\alpha$-helix going deep into the major groove of DNA. Suppose the protein encounters $l$ base pairs between positions $i$ and $i + l$. We call this position of the protein site $i$ and characterize it by the energy $\epsilon_i < 0$, where the energy of the free protein in water is chosen to be 0. Because the sliding protein has a complex nonuniform structure and interacts with a random DNA sequence, the total energy $\epsilon_i$ randomly fluctuates along DNA (Fig. 1). One can assume that at nonspecific positions on DNA, the protein exploits the same set of potential hydrogen bonds it forms with the target [12]. Since target recognition is often mediated by hydrogen bonds to some of the four chemical groups on the major groove side of the base pair [13], and the recognition $\alpha$-helix interacts with several base pairs, many hydrogen bonds contribute to $\epsilon_i$. Therefore the distribution of $\epsilon_i$ can be approximated by the Gaussian distribution with mean $w$ and standard deviation $\sigma \ll |w|$:

$$g(\epsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(\epsilon_i - w)^2}{2\sigma^2}\right]. \tag{1}$$



In this paper we study the role of disorder in the rate enhancement $J/J_s$, assuming that disorder is strong, i.e., $\sigma > kT$, where $k$ is the Boltzmann constant and $T$ is the ambient temperature.

Similar to the case of the flat energy profile, we assume that transport outside the antenna is mainly due to 3D diffusion, while inside the antenna transport is dominated by sliding, or 1D diffusion along DNA, and we equate the fluxes $J_1$ and $J_3$ to find $\lambda$. The rate $J_3$ is given by the Smoluchowski formula for the target size $\lambda$ and for the concentration $c_3$ of "free" (not adsorbed) proteins: $J_3 \sim D_3 c_3 \lambda$. The flux on the antenna, $J_1$, strongly depends on $\sigma$ and also, generally speaking, on the DNA sequence in the finite antenna. We show that there is a characteristic antenna length $\lambda = \lambda_c(\sigma, T)$ such that at $\lambda > \lambda_c$ the flux $J_1$ self-averages and becomes sequence independent. Such a "macroscopic" antenna determines $J/J_s$ for moderate disorder. In this case, the ratio $J/J_s$ decreases exponentially fast with growing disorder. At stronger disorder we deal with a mesoscopic antenna with $\lambda < \lambda_c$ and, strictly speaking, $J/J_s$ depends on the random DNA sequence. In this paper, we concentrate only on the most probable value of $J/J_s$. In order to calculate it, we estimate the most probable value of $J_1$. We show that in such a mesoscopic situation disorder leads to a weaker reduction of $J/J_s$.

We assume that within some volume $v$ there is a straight, immobile (double helical) DNA with length $L$ smaller than $v^{1/3}$, but much larger than any antenna length. For a dilute DNA solution, $1/v$ stands for the concentration of DNA. We also assume that all the microscopic length scales, such as the length of a base pair, the size of the target site, the diameter of the DNA, etc., are of the same order $b$. We are mainly interested in the scaling dependence of the rate enhancement $J/J_s$ on the major system parameters, such as $\sigma$, $w$, $L$ and $v$. This means that all numerical coefficients are dropped in our scaling estimates.

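To estimate $J_1$, we assume that at each site $i$ on DNA the protein has some probability of hopping to the nearest neighboring sites $j$. We write the probability per unit time for hopping from an occupied site $i$ to an empty site $j$ as

$$\gamma_{ij} = \nu_0 \exp\left[-\frac{\epsilon_j - \epsilon_i + |\epsilon_j - \epsilon_i|}{2kT}\right] = \begin{cases} \nu_0 \exp\left(-\dfrac{\epsilon_j - \epsilon_i}{kT}\right) & \text{if } \epsilon_j > \epsilon_i, \\ \nu_0 & \text{if } \epsilon_j < \epsilon_i, \end{cases} \tag{2}$$

where $\nu_0 \sim D_1/b^2$ is the effective attempt frequency. In Eq. (2) we neglected the activation barriers separating the two states in comparison with $\epsilon_j - \epsilon_i$. The number of proteins making such a transition from site $i$ to $j$ per unit time can be estimated by $\Gamma_{ij} = \gamma_{ij} f_i (1 - f_j)$, where the function $f_i$ is the average occupation number of site $i$. At small enough $c$, all $f_i \ll 1$ and thus $\Gamma_{ij} \simeq \gamma_{ij} f_i$. The function $f_i$ is then given by

$$f_i = \exp[-(\epsilon_i - \mu_i)/kT], \tag{3}$$

where $\mu_i$ is the chemical potential. Using $\Gamma_{ij}$ and $\Gamma_{ji}$, we can write the net flux from site $i$ to $j$ in the form

$$J_{ij} = \Gamma_{ij} - \Gamma_{ji} \simeq \nu_0\, e^{-\epsilon_{ij}/kT} \left(e^{\mu_i/kT} - e^{\mu_j/kT}\right), \tag{4}$$

where $\epsilon_{ij} = \max\{\epsilon_i, \epsilon_j\}$.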
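The step from Eqs. (2) and (3) to Eq. (4) can be checked numerically. The sketch below is only an illustration: the function names are ours, energies are measured in units of $kT$, and the dilute limit $f_i \ll 1$ is assumed throughout.

```python
import math

def hop_rate(eps_i, eps_j, kT=1.0, nu0=1.0):
    """Eq. (2): uphill hops are thermally activated, downhill hops occur at rate nu0."""
    return nu0 * math.exp(-max(eps_j - eps_i, 0.0) / kT)

def occupation(eps_i, mu_i, kT=1.0):
    """Eq. (3): average occupation number of site i in the dilute limit."""
    return math.exp(-(eps_i - mu_i) / kT)

def net_flux(eps_i, eps_j, mu_i, mu_j, kT=1.0, nu0=1.0):
    """Gamma_ij - Gamma_ji with Gamma_ij = gamma_ij * f_i (all f << 1)."""
    forward = hop_rate(eps_i, eps_j, kT, nu0) * occupation(eps_i, mu_i, kT)
    backward = hop_rate(eps_j, eps_i, kT, nu0) * occupation(eps_j, mu_j, kT)
    return forward - backward

def net_flux_eq4(eps_i, eps_j, mu_i, mu_j, kT=1.0, nu0=1.0):
    """Closed form of Eq. (4) with eps_ij = max(eps_i, eps_j)."""
    eps_ij = max(eps_i, eps_j)
    return nu0 * math.exp(-eps_ij / kT) * (
        math.exp(mu_i / kT) - math.exp(mu_j / kT)
    )
```

Both functions agree for any pair of site energies, and the flux vanishes when $\mu_i = \mu_j$, as expected for detailed balance.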

We now argue that as long as the antenna is only a small part of the DNA molecule, every protein adsorbs to DNA and desorbs many times before it locates the target. Therefore, outside the antenna there is statistical equilibrium between adsorbed and desorbed proteins, and hence proteins have the uniform chemical potential $\mu_i = \mu = kT \ln(c_3 b^3)$. Within the antenna, $\mu_i$ decreases as the site approaches the target and reaches $-\infty$ at the target site (see Fig. 1). If we label the border of the antenna as site 1 and the target as site $\lambda/b + 1$, then using Eq. (4) we can write

$$\sum_{i=1}^{\lambda/b} J_{ij} \exp\left(\frac{\epsilon_{ij}}{kT}\right) = \nu_0 c_3 b^3, \tag{5}$$



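where $j = i + 1$. Since the 1D current $J_1$ towards the target is the same at any antenna site, i.e., $J_{ij} = J_1$, we can find it as

$$J_1 = \frac{\nu_0 c_3 b^3}{\sum_{i=1}^{\lambda/b} \exp(\epsilon_{ij}/kT)} = \frac{\nu_0 c_3 b^3 \sqrt{2\pi\sigma^2}}{(\lambda/b) \int_{-\infty}^{0} R(\epsilon_{ij})\, d\epsilon_{ij}}, \tag{6}$$

where $R(\epsilon_{ij})$ is given by

$$R(\epsilon_{ij}) = \sqrt{2\pi\sigma^2}\, g(\epsilon_{ij}) \exp(\epsilon_{ij}/kT) = \exp\left\{\frac{\sigma^2}{2(kT)^2} + \frac{w}{kT} - \frac{[\epsilon_{ij} - (w + \sigma^2/kT)]^2}{2\sigma^2}\right\}. \tag{7}$$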



One can interpret Eq. (6) as Ohm's law, where the numerator plays the role of the voltage applied to the antenna and the denominator is the sum of the resistances of all pairs $(i, j)$, which are similar to the Miller-Abrahams resistances for the hopping transport of electrons [16].
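The Ohm's-law reading of Eq. (6) can be made concrete with a short simulation. The sketch below is only illustrative: it draws site energies from the Gaussian of Eq. (1) with a fixed seed, uses units where $kT = \nu_0 = 1$, and the antenna size, $w$, $\sigma$ and $c_3 b^3$ are arbitrary assumed values.

```python
import math
import random

def sliding_current(energies, c3_b3, kT=1.0, nu0=1.0):
    """First form of Eq. (6): J_1 = nu0 * c3 * b^3 / sum_i exp(eps_ij / kT).

    `energies` lists site energies along the antenna; n + 1 sites give n
    bonds in series, each with "resistance" exp(max(eps_i, eps_j) / kT).
    """
    bond_resistances = [
        math.exp(max(e_i, e_j) / kT)
        for e_i, e_j in zip(energies, energies[1:])
    ]
    return nu0 * c3_b3 / sum(bond_resistances)

rng = random.Random(0)                 # fixed seed for reproducibility
w, sigma, n_sites = -10.0, 2.0, 2001   # assumed values, in units of kT

flat = [w] * n_sites                                      # sigma = 0 profile
rough = [rng.gauss(w, sigma) for _ in range(n_sites)]     # disordered profile

J_flat = sliding_current(flat, c3_b3=1e-6)
J_rough = sliding_current(rough, c3_b3=1e-6)
```

Because the resistances add in series, a few high-energy sites dominate the sum, which is why the disordered profile carries a smaller current than the flat one with the same mean energy.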

The sharp maximum of the function $R(\epsilon_{ij})$, which determines the sum in Eq. (6), is reached at $\epsilon_{ij} = \epsilon_{opt} = w + \sigma^2/kT$, where $R(\epsilon_{opt}) = \exp[\sigma^2/2(kT)^2 + w/kT]$. Thus

$$J_1 \sim \frac{D_3 c_3 b^2}{\lambda} \exp\left[\frac{|w|}{kT} - \frac{\sigma^2}{2(kT)^2}\right], \tag{8}$$

where we assumed for simplicity that $D_3 = D_1 \sim b^2 \nu_0$.

Before we move forward, we emphasize the crucial assumption already made in the above derivation. We assumed that $\lambda$ is so long that within the antenna the sliding protein encounters sites with energy $\epsilon_{opt}$ more than once and, therefore, the sum in Eq. (6) can be replaced by the integral with limits from $-\infty$ to 0. We call such an antenna macroscopic. For a short antenna, the probability for such a site to appear inside is very small. Thus the sum in Eq. (6) is determined by the largest value of $R(\epsilon_{ij})$ typically available within the antenna. We call such an antenna mesoscopic.
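The role of $\epsilon_{opt}$ can be seen by integrating $R(\epsilon_{ij})$ of Eq. (7) numerically. The following sketch uses a plain midpoint rule and assumed values $w = -12\,kT$, $\sigma = 2\,kT$; it is meant only to illustrate the macroscopic limit and the effect of cutting the integral below $\epsilon_{opt}$.

```python
import math

def R(eps, w, sigma, kT=1.0):
    """Eq. (7): R = sqrt(2*pi*sigma^2) * g(eps) * exp(eps / kT)."""
    return math.exp(-(eps - w) ** 2 / (2.0 * sigma ** 2) + eps / kT)

def mean_resistance(w, sigma, kT=1.0, upper=0.0, steps=20000):
    """Midpoint estimate of (2*pi*sigma^2)^(-1/2) * integral of R up to `upper`."""
    lower = w - 12.0 * sigma          # the Gaussian tail below this is negligible
    h = (upper - lower) / steps
    total = sum(R(lower + (k + 0.5) * h, w, sigma, kT) for k in range(steps))
    return total * h / math.sqrt(2.0 * math.pi * sigma ** 2)

w, sigma, kT = -12.0, 2.0, 1.0        # assumed values, in units of kT
eps_opt = w + sigma ** 2 / kT         # position of the maximum of R
closed_form = math.exp(sigma ** 2 / (2 * kT ** 2) + w / kT)   # R(eps_opt)

macroscopic = mean_resistance(w, sigma, kT)               # upper limit 0
truncated = mean_resistance(w, sigma, kT, upper=-10.0)    # cut below eps_opt
```

With the upper limit at 0 the average resistance per bond reproduces $R(\epsilon_{opt})$, which gives Eq. (8). Cutting the integral below $\epsilon_{opt}$, as happens in a short antenna, leaves only a small part of that resistance.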

FIG. 2: (color online) The phase diagram of scaling regimes for $|w| \gg kT$. Each line marks a smooth crossover between scaling regimes. The red line $|w| = 3\sigma^2/2kT$ marks border 1 between the macroscopic regimes (A, B) and the mesoscopic regimes (C, D). The blue line $|w| = kT \ln(v/Lb^2) - \sigma^2/2kT$ marks border 2 between the weak and strong adsorption regimes. They intersect at $\sigma = kT\,[(1/2) \ln(v/Lb^2)]^{1/2}$, $|w| = (3/4)\,kT \ln(v/Lb^2)$.

Macroscopic antenna. We study the macroscopic antenna first. Using $J_1$ and $J_3$, our main balance equation for the rate $J$ reads

$$J \sim D_3 c_3 \lambda \sim \frac{D_3 c_3 b^2}{\lambda} \exp\left[\frac{|w|}{kT} - \frac{\sigma^2}{2(kT)^2}\right]. \tag{9}$$

Thus the antenna length $\lambda$ is obtained as

$$\lambda \sim b \exp\left[\frac{|w|}{2kT} - \frac{\sigma^2}{4(kT)^2}\right]. \tag{10}$$
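The flux matching behind Eqs. (9) and (10) can be verified directly. In the sketch below all numerical coefficients are dropped, as in the scaling estimates of the text, and the values of $w$ and $\sigma$ are assumptions chosen for illustration.

```python
import math

def antenna_length_macro(w, sigma, kT=1.0, b=1.0):
    """Eq. (10): lambda ~ b * exp(|w| / (2 kT) - sigma^2 / (4 kT^2))."""
    return b * math.exp(abs(w) / (2 * kT) - sigma ** 2 / (4 * kT ** 2))

def flux_3d(lam, D3=1.0, c3=1.0):
    """Smoluchowski flux onto an antenna of size lambda: J_3 ~ D3 * c3 * lambda."""
    return D3 * c3 * lam

def flux_1d_macro(lam, w, sigma, kT=1.0, b=1.0, D3=1.0, c3=1.0):
    """Eq. (8): sliding flux along a macroscopic antenna of length lambda."""
    return D3 * c3 * b ** 2 / lam * math.exp(
        abs(w) / kT - sigma ** 2 / (2 * kT ** 2)
    )

# Assumed values in units of kT; here |w| > 3 sigma^2 / (2 kT).
lam = antenna_length_macro(-12.0, 2.0)
```

At this $\lambda$ the two fluxes coincide, and setting $\sigma = 0$ gives the longer antenna of the flat energy profile.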



Next we calculate the free protein concentration $c_3$. Suppose the one-dimensional concentration of non-specifically adsorbed proteins is $c_1$. Assuming the antenna is only a small part of the DNA and remembering that adsorbed proteins are confined within a distance of order $b$ from the DNA, we can write the equilibrium condition as

$$\frac{c_1}{c_3 b^2} = \int g(\epsilon)\, e^{-\epsilon/kT}\, d\epsilon = \exp\left[\frac{|w|}{kT} + \frac{\sigma^2}{2(kT)^2}\right], \tag{11}$$

which must be complemented by the particle counting condition $c_1 L + c_3 (v - Lb^2) = cv$. Since the volume fraction of DNA is always small, $Lb^2 \ll v$, standard algebra then yields

$$c_3 \simeq \begin{cases} c & \text{if } y < v/Lb^2, \\ cv/(Lb^2 y) & \text{if } y > v/Lb^2, \end{cases} \tag{12}$$



where $y = \exp[|w|/kT + \sigma^2/2(kT)^2]$. Eqs. (12) lead to two different scaling regimes, which are denoted as A and B in the diagram of Fig. 2. In regime A, the non-specific adsorption is relatively weak and $c_3 \simeq c$, so we arrive at

$$\frac{J}{J_s} \sim \exp\left[\frac{|w|}{2kT} - \frac{\sigma^2}{4(kT)^2}\right] \quad \text{(regime A)}. \tag{13}$$

In regime B, most proteins are adsorbed. Using the lower line of Eqs. (12), we obtain

$$\frac{J}{J_s} \sim \frac{v}{Lb^2} \exp\left[-\frac{|w|}{2kT} - \frac{3\sigma^2}{4(kT)^2}\right] \quad \text{(regime B)}. \tag{14}$$



In both regimes $|w| > \sigma^2/kT$, so the $\sigma$ term of $\ln(J/J_s)$ constitutes a correction. The size of the antenna grows with $|w|$; however, the unproductive non-specific adsorption of proteins on distant pieces of DNA, which can slow down the transport to the specific target, grows with $|w|$ too. These two effects compete; as a result, the rate enhancement $J/J_s$ grows with $|w|$ in regime A and declines in regime B. On the other hand, growing $\sigma$ reduces the antenna size and promotes non-specific adsorption. Therefore, $J/J_s$ decreases with $\sigma$ in both regimes.
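Regimes A and B can be recovered as the two limits of a single expression. The sketch below solves the equilibrium condition, Eq. (11), together with the particle counting condition for $Lb^2 \ll v$, which gives $c_3/c = 1/(1 + yLb^2/v)$, and multiplies by $\lambda/b$ from Eq. (10). The interpolation formula and the parameter values are our illustration; the text itself uses only the two limits of Eqs. (12).

```python
import math

def free_fraction(w, sigma, v_over_Lb2, kT=1.0):
    """c3/c from c1*L + c3*v = c*v with c1 = y * c3 * b^2 (assuming L b^2 << v)."""
    y = math.exp(abs(w) / kT + sigma ** 2 / (2 * kT ** 2))
    return 1.0 / (1.0 + y / v_over_Lb2)

def enhancement_macro(w, sigma, v_over_Lb2, kT=1.0):
    """J/J_s ~ (c3/c) * (lambda/b), with lambda from Eq. (10); coefficients dropped."""
    lam_over_b = math.exp(abs(w) / (2 * kT) - sigma ** 2 / (4 * kT ** 2))
    return free_fraction(w, sigma, v_over_Lb2, kT) * lam_over_b

# Assumed geometry: L/b = 150 and v ~ L^3, so v / (L b^2) = (L/b)^2.
V = 150.0 ** 2

weak = enhancement_macro(-4.0, 1.0, V)      # y << V: Eq. (13), regime A
strong = enhancement_macro(-20.0, 1.0, V)   # y >> V: Eq. (14), regime B
```

For weak adsorption the result approaches $\exp[|w|/2kT - \sigma^2/4(kT)^2]$, and for strong adsorption it approaches $(v/Lb^2)\exp[-|w|/2kT - 3\sigma^2/4(kT)^2]$.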

The above theory deals with a macroscopic antenna. To be macroscopic, the antenna has to contain at least one site with energy around $\epsilon_{opt}$. The number of sites $n(\epsilon)$ with energy $\epsilon$ within the antenna is of the order of $(\lambda/b) \exp[-(\epsilon - w)^2/2\sigma^2]$. Thus a macroscopic antenna requires $n(\epsilon_{opt}) > 1$, which gives $\lambda > \lambda_c = b \exp[\sigma^2/2(kT)^2]$. Since we know $\lambda$ from Eq. (10), this condition can be written explicitly as $|w| > 3\sigma^2/2kT$. Hence, $|w| = 3\sigma^2/2kT$ is the border between the macroscopic regimes (A, B) and the mesoscopic regimes (C, D) in Fig. 2. We can check that when $|w| > 3\sigma^2/2kT$, the condition $\epsilon_{opt} < 0$ is satisfied for the case of the macroscopic antenna. Now we are ready to switch to the case of the mesoscopic antenna and explain regimes C and D.
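The two borders of the phase diagram in Fig. 2 can be turned into a small classifier. This is a sketch under the assumptions of the text ($|w| \gg kT$, sharp borders in place of smooth crossovers); the function names and the sample points are ours.

```python
import math

def critical_antenna_length(sigma, kT=1.0, b=1.0):
    """lambda_c = b * exp(sigma^2 / (2 kT^2)), the shortest macroscopic antenna."""
    return b * math.exp(sigma ** 2 / (2 * kT ** 2))

def classify_regime(w, sigma, v_over_Lb2, kT=1.0):
    """Return 'A', 'B', 'C' or 'D' using borders 1 and 2 of Fig. 2."""
    abs_w = abs(w)
    # Border 1: macroscopic antenna if |w| > 3 sigma^2 / (2 kT).
    macroscopic = abs_w > 3 * sigma ** 2 / (2 * kT)
    # Border 2: weak adsorption if |w| < kT ln(v / L b^2) - sigma^2 / (2 kT).
    weak = abs_w < kT * math.log(v_over_Lb2) - sigma ** 2 / (2 * kT)
    if macroscopic:
        return "A" if weak else "B"
    return "C" if weak else "D"

# Assumed geometry: L/b = 150 and v ~ L^3, so v / (L b^2) = (L/b)^2.
V = 150.0 ** 2
```

For example, with energies in units of $kT$, the point $w = -4$, $\sigma = 1$ falls in regime A, while $w = -9$, $\sigma = 3$ falls in regime D.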

Mesoscopic antenna. In this case, the upper limit of the integral in Eq. (6) should be replaced by $\epsilon_\lambda \ll \epsilon_{opt}$, the largest energy typically available within the antenna. It can be estimated from $n(\epsilon_\lambda) \sim 1$, which gives $\epsilon_\lambda \simeq w + \sqrt{2}\,\sigma \sqrt{\ln(\lambda/b)}$. Using $w$ and $\epsilon_\lambda$, we can estimate the sum in Eq. (6) and get the typical 1D current for the case of the mesoscopic antenna:

$$J_1(\lambda) \sim D_3 c_3 b \exp\left[\frac{|w|}{kT} - \frac{\sigma\sqrt{2\ln(\lambda/b)}}{kT}\right]. \tag{15}$$



Eq. (15) is clearly different from Eq. (8), which is valid for the macroscopic antenna. This difference is partially related to the rate enhancement of 1D diffusion at small time scales noticed for Gaussian disorder in computer simulations [12]. Equating $J_1(\lambda)$ to $J_3 \sim D_3 c_3 \lambda$, we obtain the antenna length

$$\lambda \sim b \exp\left[\left(\sqrt{\frac{|w|}{kT} + \frac{\sigma^2}{2(kT)^2}} - \frac{\sigma}{\sqrt{2}\,kT}\right)^2\right]. \tag{16}$$

We can check, with this $\lambda$, that the condition $\epsilon_\lambda < 0$ still holds. When $|w| < \sigma^2/2kT$, the antenna length $\lambda \sim b \exp(w^2/2\sigma^2)$. For a given adsorption energy $w$, the dependence $\lambda(\sigma)$ is plotted in Fig. 3. It shows that the decrease of the antenna length with growing disorder strength slows down when the antenna becomes mesoscopic.

FIG. 3: Dependence of the antenna length $\lambda$ on the disorder strength $\sigma$. Dashed lines represent the asymptotic limits.

The crossover from relatively weak adsorption to strong adsorption described by Eqs. (12) again leads to two scaling regimes for the case of the mesoscopic antenna. They are labeled C and D in the diagram of Fig. 2. For relatively weak adsorption, when $|w| < \sigma^2/kT$, we obtain regime C, where

$$\frac{J}{J_s} \sim \exp\left(\frac{w^2}{2\sigma^2}\right) \quad \text{(regime C)}, \tag{17}$$

while for strong adsorption we have regime D, where

$$\frac{J}{J_s} \sim \frac{v}{Lb^2} \exp\left[-\frac{|w|}{kT} - \frac{\sigma^2}{2(kT)^2}\right] \quad \text{(regime D)}. \tag{18}$$

FIG. 4: Schematic plot of the dependence of the rate enhancement $J/J_s$ on $|w|$ at $\sigma = \sigma_1$ (upper solid curve) and $\sigma = \sigma_2$ (lower solid curve). Letters A, B, C, D represent the domains of Fig. 2 they go through. The dashed line shows the limiting case of the flat energy profile with $\sigma = 0$.
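The mesoscopic antenna length of Eq. (16) follows from a quadratic equation for $x = \sqrt{\ln(\lambda/b)}$, namely $x^2 + (\sqrt{2}\,\sigma/kT)\,x - |w|/kT = 0$, obtained by equating Eq. (15) to $J_3 \sim D_3 c_3 \lambda$. The sketch below solves it and checks the result against the matching condition; numerical coefficients are dropped as in the text, and the sample parameters are assumptions.

```python
import math

def antenna_length_meso(w, sigma, kT=1.0, b=1.0):
    """Eq. (16): positive root of x^2 + 2 s x - a = 0 with x = sqrt(ln(lambda/b))."""
    a = abs(w) / kT
    s = sigma / (math.sqrt(2.0) * kT)
    x = math.sqrt(a + s * s) - s
    return b * math.exp(x * x)

def matching_residual(lam, w, sigma, kT=1.0, b=1.0):
    """ln(lambda/b) - [|w|/kT - sigma * sqrt(2 ln(lambda/b)) / kT]; zero at the solution."""
    log_l = math.log(lam / b)
    return log_l - (abs(w) / kT - sigma * math.sqrt(2.0 * log_l) / kT)

lam = antenna_length_meso(-4.0, 3.0)            # a point below border 1
on_border = antenna_length_meso(-6.0, 2.0)      # |w| = 3 sigma^2 / (2 kT)
small_w = antenna_length_meso(-2.0, 10.0)       # |w| << sigma^2 / (2 kT)
```

On border 1 this length coincides with both Eq. (10) and $\lambda_c = b\exp[\sigma^2/2(kT)^2]$, so the macroscopic and mesoscopic expressions join continuously. For small $|w|$ it approaches $b\exp(w^2/2\sigma^2)$.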



In experiment, the adsorption energy $w$ can be controlled by the salt concentration, which changes the Coulomb part of the protein-DNA interaction. The dependencies of $\ln(J/J_s)$ on $|w|$ at the two values of disorder strength $\sigma_1$ and $\sigma_2$ marked in Fig. 2 are schematically plotted in Fig. 4. For comparison, we also plot the case of the flat energy profile ($\sigma = 0$). In both cases with $\sigma > 0$, $\ln(J/J_s)$ first grows proportionally to $w^2$ (regime C), because the antenna is mesoscopic and thus 1D diffusion is faster than the normal diffusion on a macroscopic antenna. For a relatively small disorder $\sigma = \sigma_1$, this rate enhancement continues into regime A, but with growth proportional to $|w|$ because the antenna becomes macroscopic. For a larger disorder $\sigma = \sigma_2$, strong nonspecific adsorption of proteins on distant pieces of DNA slows down the search while the antenna is still mesoscopic, and $\ln(J/J_s)$ decreases in regime D faster than it does in regime B. The antenna in regime B is macroscopic and $\ln(J/J_s)$ decreases proportionally to $|w|$ for both $\sigma = \sigma_1$ and $\sigma = \sigma_2$.

The crossover from weak disorder to strong disorder happens at $\sigma = \sigma_0 = kT\,[(1/2) \ln(v/Lb^2)]^{1/2}$ (see Fig. 2). If one plugs in achievable experimental conditions with $L/b \sim 150$ and $v \sim L^3$, the estimate of $\sigma_0$ is of the order of $2kT$, which falls in the range of estimates of $\sigma$ from $1kT$ to $6kT$ used in Refs. [11, 10]. Apparently, $\sigma$ grows for proteins with a larger number of contacts with DNA, and $\sigma_0$ decreases with DNA concentration. In order to identify the role of strong disorder, we look forward to more experiments dealing with relatively large concentrations of short straight DNA to guarantee that the disorder strength satisfies $\sigma > \sigma_0$.
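The estimate of $\sigma_0$ quoted above is a one-line calculation. The sketch below reproduces it for $L/b = 150$ and $v \sim L^3$, for which $v/Lb^2 = (L/b)^2$; the function name is ours and energies are in units of $kT$.

```python
import math

def crossover_disorder(v_over_Lb2, kT=1.0):
    """sigma_0 = kT * sqrt(0.5 * ln(v / (L b^2)))."""
    return kT * math.sqrt(0.5 * math.log(v_over_Lb2))

L_over_b = 150.0
v_over_Lb2 = L_over_b ** 2                 # v ~ L^3 gives v / (L b^2) = (L/b)^2

sigma0 = crossover_disorder(v_over_Lb2)    # about 2.24 kT
w0 = 0.75 * math.log(v_over_Lb2)           # |w| at the intersection, about 7.5 kT
```

Since $\sigma_0$ depends on $v/Lb^2$ only through the square root of a logarithm, even large changes in DNA concentration shift it only slightly.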

We know only one observation [17] of the peak in the coordinates of Fig. 4, but it was made for a long and definitely coiled DNA, for which our theory is not directly applicable. Indeed, in this paper we concentrated on the case of relatively short and, therefore, straight DNA. In our recent paper [9], we presented a general theory including Gaussian coiled and globular DNA in the absence of disorder. In the current paper, we did not touch these cases because of our prejudice that simple questions should be addressed first. We concentrated on the simplest regimes labeled A and D in figure 4a of Ref. [9] and still got the rather complicated diagram of Fig. 2. That is why we did not try to present our theory for more complicated regimes here.

We are grateful to A. Yu. Grosberg, S. D. Baranovskii and J. Zhang for useful discussions.